bandwidth allocation
Accelerating Wireless Distributed Learning via Hybrid Split and Federated Learning Optimization
Guo, Kun, Li, Xuefei, Wang, Xijun, Yang, Howard H., Feng, Wei, Quek, Tony Q. S.
Federated learning (FL) and split learning (SL) are two effective distributed learning paradigms in wireless networks, enabling collaborative model training across mobile devices without sharing raw data. While FL supports low-latency parallel training, it may converge to a less accurate model. In contrast, SL achieves higher accuracy through sequential training but suffers from increased delay. To leverage the advantages of both, hybrid split and federated learning (HSFL) allows some devices to operate in FL mode and others in SL mode. This paper aims to accelerate HSFL by addressing three key questions: 1) How does learning mode selection affect overall learning performance? 2) How does it interact with batch size? 3) How can these hyperparameters be jointly optimized alongside communication and computational resources to reduce overall learning delay? We first analyze convergence, revealing the interplay between learning mode and batch size. Next, we formulate a delay minimization problem and propose a two-stage solution: a block coordinate descent method for a relaxed problem to obtain a locally optimal solution, followed by a rounding algorithm to recover integer batch sizes with near-optimal performance. Experimental results demonstrate that our approach significantly accelerates convergence to the target accuracy compared to existing methods.
FairEnergy: Contribution-Based Fairness meets Energy Efficiency in Federated Learning
Marnissi, Ouiame, Hammouti, Hajar EL, Bergou, El Houcine
Federated learning (FL) enables collaborative model training across distributed devices while preserving data privacy. However, balancing energy efficiency and fair participation while ensuring high model accuracy remains challenging in wireless edge systems due to heterogeneous resources, unequal client contributions, and limited communication capacity. To address these challenges, we propose FairEnergy, a fairness-aware energy minimization framework that integrates a contribution score capturing both the magnitude of updates and their compression ratio into the joint optimization of device selection, bandwidth allocation, and compression level. The resulting mixed-integer non-convex problem is solved by relaxing binary selection variables and applying Lagrangian decomposition to handle global bandwidth coupling, followed by per-device subproblem optimization. Experiments on non-IID data show that FairEnergy achieves higher accuracy while reducing energy consumption by up to 79% compared to baseline strategies.
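The way Lagrangian decomposition handles a shared bandwidth budget can be sketched on a toy problem. This is a minimal sketch under stated assumptions, not FairEnergy's formulation: device selection and compression are left out, each device i is assumed to incur energy c_i / b_i for bandwidth b_i, and the function name `dual_bandwidth_allocation` and the cost values are hypothetical. Pricing the coupling constraint sum(b_i) <= B with a multiplier lam splits the problem into per-device subproblems min c_i / b_i + lam * b_i, each solved in closed form by b_i = sqrt(c_i / lam), and lam is then tuned by bisection until the budget is met.

```python
import math
from typing import List, Tuple


def dual_bandwidth_allocation(
    costs: List[float], total_bw: float, iters: int = 100
) -> Tuple[List[float], float]:
    """Toy dual decomposition for: minimize sum(c_i / b_i) s.t. sum(b_i) <= total_bw.

    Assumption (not from the paper): energy of device i is c_i / b_i.
    """
    lo, hi = 1e-12, 1e12  # search bracket for the bandwidth price lam
    for _ in range(iters):
        lam = math.sqrt(lo * hi)  # bisection in log space
        # Per-device subproblem: argmin_b c/b + lam*b  ->  b = sqrt(c / lam)
        used = sum(math.sqrt(c / lam) for c in costs)
        if used > total_bw:
            lo = lam  # price too low, devices request too much bandwidth
        else:
            hi = lam  # price high enough, try a lower one
    lam = math.sqrt(lo * hi)
    return [math.sqrt(c / lam) for c in costs], lam


# Hypothetical per-device cost coefficients and a bandwidth budget of 6 units.
alloc, price = dual_bandwidth_allocation([1.0, 4.0, 9.0], total_bw=6.0)
```

In this toy model the allocation is proportional to the square root of each device's cost coefficient, so the three devices receive roughly 1, 2, and 3 units.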
Deploying Large AI Models on Resource-Limited Devices with Split Federated Learning
Qiang, Xianke, Liu, Hongda, Zhang, Xinran, Chang, Zheng, Liang, Ying-Chang
Large Artificial Intelligence Models (LAMs) are powered by massive datasets, extensive parameter scales, and extensive computational resources, and have led to significant transformations across various industries. Yet, their practical deployment on resource-limited mobile edge devices is hindered by critical challenges such as data privacy, constrained resources, and high overhead costs. Addressing this gap, this paper proposes a novel framework, named Quantized Split Federated Fine-Tuning Large AI Model (SFLAM). By partitioning the training load between edge devices and servers using a split learning paradigm, SFLAM can facilitate the operation of large models on devices and significantly lower the memory requirements on edge devices. Additionally, SFLAM incorporates quantization management, power control, and bandwidth allocation strategies to enhance training efficiency while concurrently reducing energy consumption and communication latency. A theoretical analysis exploring the latency-energy trade-off is presented, and the framework's efficacy is validated via comprehensive simulations. The findings indicate that SFLAM achieves superior performance in terms of learning efficiency and scalability compared to conventional methods, thereby providing a valuable approach for enabling advanced AI services in resource-constrained scenarios.
I. Introduction
The advent of Large AI Models (LAMs), such as Chat-GPT and DeepSeek, marked a significant leap in AI capabilities, powered by their extensive parameter scales, large-scale datasets, and substantial computational resources [1]. As user demand for ubiquitous AI access and real-time, personalized experiences grows, deploying and training these models on mobile devices becomes increasingly relevant [2]. To meet these escalating demands, fine-tuning, which involves adapting pre-trained models with domain-specific data, has become a widely adopted and efficient strategy for enhancing LAM performance on specialized tasks, offering a cost-effective path to superior results.
PartialLoading: User Scheduling and Bandwidth Allocation for Parameter-sharing Edge Inference
Qu, Guanqiao, Chen, Qian, Chen, Xianhao, Huang, Kaibin, Fang, Yuguang
By provisioning inference offloading services, edge inference drives the rapid growth of AI applications at the network edge. However, achieving high task throughput with stringent latency requirements remains a significant challenge. To address this issue, we develop a parameter-sharing AI model loading (PartialLoading) framework for multi-user edge inference, which exploits two key insights: 1) the majority of latency arises from loading AI models into server GPU memory, and 2) different AI models can share a significant number of parameters, for which redundant loading should be avoided. Towards this end, we formulate a joint multi-user scheduling and spectrum bandwidth allocation problem to maximize task throughput by exploiting shared parameter blocks across models. The intuition is to judiciously schedule user requests to reuse the shared parameter blocks between consecutively loaded models, thereby reducing model loading time substantially. To facilitate solution finding, we decouple the problem into two sub-problems, i.e., user scheduling and bandwidth allocation, showing that solving them sequentially is equivalent to solving the original problem. Due to the NP-hardness of the problem, we first study an important special case called the "bottom-layer-sharing" case, where AI models share some bottom layers within clusters, and design a dynamic programming-based algorithm to obtain the optimal solution in polynomial time. For the general case, where shared parameter blocks appear at arbitrary positions within AI models, we propose a greedy heuristic to obtain the sub-optimal solution efficiently. Simulation results demonstrate that the proposed framework significantly improves task throughput under deadline constraints compared with user scheduling without exploiting parameter sharing.
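The reuse intuition in the bottom-layer-sharing case can be illustrated with a small sketch. It assumes that each model is a list of parameter-block identifiers ordered from the bottom layer up, that each block has a fixed loading time, and that only the bottom prefix shared with the previously loaded model stays in GPU memory. The function names, block names, and timings are hypothetical, and this is only a cost evaluator for a given loading order, not the paper's dynamic programming algorithm.

```python
from typing import Dict, List, Sequence


def shared_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading (bottom-layer) blocks that two models have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def total_load_time(order: List[Sequence[str]], block_time: Dict[str, float]) -> float:
    """Total loading time when models are loaded consecutively in `order`.

    Assumption: blocks in the bottom prefix shared with the previously
    loaded model are already in GPU memory and are not reloaded.
    """
    total = 0.0
    prev: Sequence[str] = ()
    for model in order:
        keep = shared_prefix_len(prev, model)
        total += sum(block_time[b] for b in model[keep:])
        prev = model
    return total


# Hypothetical models: A and B share two bottom blocks, C shares nothing.
block_time = {"b0": 2.0, "b1": 2.0, "hA": 1.0, "hB": 1.0, "c0": 2.0, "hC": 1.0}
model_a = ["b0", "b1", "hA"]
model_b = ["b0", "b1", "hB"]
model_c = ["c0", "hC"]

grouped = total_load_time([model_a, model_b, model_c], block_time)      # A, B, C
interleaved = total_load_time([model_a, model_c, model_b], block_time)  # A, C, B
```

Under these assumed timings, loading A and B back to back costs 9 time units in total, while placing C between them costs 13, because B can no longer reuse the two bottom blocks it shares with A.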
Bandwidth Allocation for Cloud-Augmented Autonomous Driving
Schafhalter, Peter, Krentsel, Alexander, Gonzalez, Joseph E., Ratnasamy, Sylvia, Shenker, Scott, Stoica, Ion
Autonomous vehicle (AV) control systems increasingly rely on ML models for tasks such as perception and planning. Current practice is to run these models on the car's local hardware due to real-time latency constraints and reliability concerns, which limits model size and thus accuracy. Prior work has observed that we could augment current systems by running larger models in the cloud, relying on faster cloud runtimes to offset the cellular network latency. However, prior work does not account for an important practical constraint: limited cellular bandwidth. We show that, for typical bandwidth levels, proposed techniques for cloud-augmented AV models take too long to transfer data, thus mostly falling back to the on-car models and resulting in no accuracy improvement. In this work, we show that realizing cloud-augmented AV models requires intelligent use of this scarce bandwidth, i.e., carefully allocating bandwidth across tasks and providing multiple data compression and model options. We formulate this as a resource allocation problem to maximize car utility, and present a system that increases average model accuracy by up to 15 percentage points on driving scenarios from the Waymo Open Dataset.
NeRFCom: Feature Transform Coding Meets Neural Radiance Field for Free-View 3D Scene Semantic Transmission
Yue, Weijie, Si, Zhongwei, Wu, Bolin, Wang, Sixian, Qin, Xiaoqi, Niu, Kai, Dai, Jincheng, Zhang, Ping
We introduce NeRFCom, a novel communication system designed for end-to-end 3D scene transmission. Compared to traditional systems relying on handcrafted NeRF semantic feature decomposition for compression and well-adaptive channel coding for transmission error correction, our NeRFCom employs a nonlinear transform and learned probabilistic models, enabling flexible variable-rate joint source-channel coding and efficient bandwidth allocation aligned with the NeRF semantic feature's different contribution to the 3D scene synthesis fidelity. Experimental results demonstrate that NeRFCom achieves efficient free-view 3D scene transmission while maintaining robustness under adverse channel conditions.
Index Terms: Neural radiance field (NeRF), 3D scene transmission, semantic features, nonlinear transform coding.
Virtual reality (VR) and augmented reality (AR) construct 3D scenes to provide users with immersive experiences [1]. However, traditional 3D scene synthesis techniques often rely on manual scene modeling, and the complex workflow increases the cost of deploying 3D technologies.
UAV-Assisted Multi-Task Federated Learning with Task Knowledge Sharing
Yang, Yubo, Yang, Tao, Wu, Xiaofeng, Hu, Bo
The rapid development of unmanned aerial vehicle (UAV) technology has spawned a wide variety of applications, such as emergency communications, regional surveillance, and disaster relief. Due to their limited battery capacity and processing power, multiple UAVs are often required for complex tasks. In such cases, a control center is crucial for coordinating their activities, which fits well with the federated learning (FL) framework. However, conventional FL approaches often focus on a single task, ignoring the potential of training multiple related tasks simultaneously. In this paper, we propose a UAV-assisted multi-task federated learning scheme, in which data collected by multiple UAVs can be used to train multiple related tasks concurrently. The scheme facilitates the training process by sharing feature extractors across related tasks and introduces a task attention mechanism to balance task performance and encourage knowledge sharing. To provide an analytical description of training performance, a convergence analysis of the proposed scheme is performed. Additionally, the optimal bandwidth allocation for UAVs under limited bandwidth conditions is derived to minimize communication time. Meanwhile, a UAV-EV association strategy based on a coalition formation game is proposed. Simulation results validate the effectiveness of the proposed scheme in enhancing multi-task performance and training speed.
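A simplified version of bandwidth allocation for minimum communication time can be worked out in a few lines. The sketch below is not the paper's derivation: it assumes that all UAVs upload in parallel, that UAV i has a fixed spectral efficiency e_i (bit/s/Hz) independent of its bandwidth share, and that the round ends when the slowest upload finishes. Under those assumptions the upload time is s_i / (b_i * e_i), and the maximum is minimized when all uploads finish together, which gives b_i proportional to s_i / e_i. The function name and the numbers are hypothetical.

```python
from typing import List, Tuple


def equal_time_bandwidth(
    sizes_bits: List[float], spectral_eff: List[float], total_bw_hz: float
) -> Tuple[List[float], float]:
    """Split total_bw_hz so that every upload finishes at the same time.

    Assumption: rate_i = b_i * e_i with a fixed spectral efficiency e_i.
    Returns the per-UAV bandwidths (Hz) and the common upload time (s).
    """
    weights = [s / e for s, e in zip(sizes_bits, spectral_eff)]
    total_weight = sum(weights)
    alloc = [total_bw_hz * w / total_weight for w in weights]
    common_time = total_weight / total_bw_hz
    return alloc, common_time


# Hypothetical setup: three UAVs sharing 10 MHz.
sizes = [8e6, 8e6, 4e6]   # model update sizes in bits
eff = [4.0, 2.0, 1.0]     # assumed spectral efficiencies in bit/s/Hz
alloc, t = equal_time_bandwidth(sizes, eff, total_bw_hz=10e6)
```

With these numbers the UAV with the best channel needs only 2 MHz, the other two receive 4 MHz each, and every upload takes about one second. Moving bandwidth away from any UAV would lengthen its upload and therefore the whole round.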
Adaptive Context-Aware Multi-Path Transmission Control for VR/AR Content: A Deep Reinforcement Learning Approach
Ahmed, Shakil, Sabuj, Saifur Rahman, Khokhar, Ashfaq
These authors present a few critical features for ACMPTC to enhance its performance, mainly choosing paths with low latency and packet loss. It brings a DRL-based agent that can adapt its decision to real-time network states and compute dynamic, optimal choices. This feedback loop, on the other hand, allows for real-time path selection and resource allocation that enables continuous optimization to provide a smooth AR/VR experience even with varying network conditions. It confirms that the system operates correctly and provides a way to update such a network when there is variation in traffic levels by adjusting it effectively.
... applications require high bandwidth, ultra-low latency, and consistent quality of service (QoS) to deliver seamless, immersive experiences [2]. Traditional network protocols like the Transmission Control Protocol (TCP) often struggle to meet these stringent demands, especially in highly dynamic and diverse network environments due to single path transmission, inadequate for high-bandwidth, low-latency requirement, high latency sensitivity, etc. [3]. These limitations make TCP less effective for dynamic, heterogeneous network environments and the demanding performance needs of modern applications ...
Split Learning in Computer Vision for Semantic Segmentation Delay Minimization
Evgenidis, Nikos G., Mitsiou, Nikos A., Tegos, Sotiris A., Diamantoulakis, Panagiotis D., Karagiannidis, George K.
In this paper, we propose a novel approach to minimize the inference delay in semantic segmentation using split learning (SL), tailored to the needs of real-time computer vision (CV) applications for resource-constrained devices. Semantic segmentation is essential for applications such as autonomous vehicles and smart city infrastructure, but faces significant latency challenges due to high computational and communication loads. Traditional centralized processing methods are inefficient for such scenarios, often resulting in unacceptable inference delays. SL offers a promising alternative by partitioning deep neural networks (DNNs) between edge devices and a central server, enabling localized data processing and reducing the amount of data required for transmission. Our contribution includes the joint optimization of bandwidth allocation, cut layer selection of the edge devices' DNN, and the central server's processing resource allocation. We investigate both parallel and serial data processing scenarios and propose low-complexity heuristic solutions that maintain near-optimal performance while reducing computational requirements. Numerical results show that our approach effectively reduces inference delay, demonstrating the potential of SL for improving real-time CV applications in dynamic, resource-constrained environments.
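Cut layer selection on its own can be illustrated with an exhaustive search for a single device. This is a minimal sketch under stated assumptions rather than the paper's joint optimization: the uplink rate is fixed, layers run strictly in sequence, and the delay of a cut is the device compute time for the layers before it, plus the time to send the activation at the cut, plus the server compute time for the remaining layers. The function name `best_cut_layer` and all timings and sizes are hypothetical.

```python
from typing import List, Tuple


def best_cut_layer(
    device_time: List[float],
    server_time: List[float],
    out_bits: List[float],
    rate_bps: float,
) -> Tuple[int, float]:
    """Return (cut, delay) minimizing end-to-end inference delay.

    cut = k means layers 0..k-1 run on the device and layers k.. on the server.
    out_bits[k] is the size of the data sent at cut k (out_bits[0] is the raw
    input, out_bits[n] the final output), so it has one more entry than layers.
    """
    n = len(device_time)
    assert len(server_time) == n and len(out_bits) == n + 1
    best_cut, best_delay = 0, float("inf")
    for cut in range(n + 1):
        delay = (
            sum(device_time[:cut])         # on-device layers
            + out_bits[cut] / rate_bps     # uplink transmission
            + sum(server_time[cut:])       # server-side layers
        )
        if delay < best_delay:
            best_cut, best_delay = cut, delay
    return best_cut, best_delay


# Hypothetical 3-layer network on a 10 Mbit/s uplink.
cut, delay = best_cut_layer(
    device_time=[0.04, 0.04, 0.10],
    server_time=[0.01, 0.01, 0.02],
    out_bits=[8e6, 1e6, 4e6, 1e5],
    rate_bps=10e6,
)
```

Here the search picks a cut after the first layer, with a delay of about 0.17 s: the first layer shrinks the data from 8 Mbit to 1 Mbit, so sending its output is much cheaper than sending the raw input, while running all layers on the slower device would cost slightly more.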
WDMoE: Wireless Distributed Mixture of Experts for Large Language Models
Xue, Nan, Sun, Yaping, Chen, Zhiyong, Tao, Meixia, Xu, Xiaodong, Qian, Liang, Cui, Shuguang, Zhang, Wenjun, Zhang, Ping
Large Language Models (LLMs) have achieved significant success in various natural language processing tasks, but the role of wireless networks in supporting LLMs has not been thoroughly explored. In this paper, we propose a wireless distributed Mixture of Experts (WDMoE) architecture to enable collaborative deployment of LLMs across edge servers at the base station (BS) and mobile devices in wireless networks. Specifically, we decompose the MoE layer in LLMs by placing the gating network and the preceding neural network layer at the BS, while distributing the expert networks among the devices. This deployment leverages the parallel inference capabilities of expert networks on mobile devices, effectively utilizing the limited computing and caching resources of these devices. Accordingly, we develop a performance metric for WDMoE-based LLMs, which accounts for both model capability and latency. To minimize the latency while maintaining accuracy, we jointly optimize expert selection and bandwidth allocation based on the performance metric. Moreover, we build a hardware testbed using NVIDIA Jetson kits to validate the effectiveness of WDMoE. Both theoretical simulations and practical hardware experiments demonstrate that the proposed method can significantly reduce the latency without compromising LLM performance.